B cells utilize three DNA alteration strategies, namely V(D)J 
recombination, somatic hypermutation (SHM) and class switch 
recombination (CSR), to somatically mutate their genome, 
thereby expressing a plethora of antibodies tailor-made against 
the innumerable antigens they encounter while in circulation. 
Of these three events, the single-strand DNA cytidine 
deaminase, Activation Induced cytidine Deaminase (AID), is 
responsible for SHM and CSR. Recent advances, discussed in 
this review article, point toward various components of RNA 
polymerase II "stalling" machinery as regulators of AID activity 
during antibody diversification and maintenance of B cell 
genome integrity. 



Antibody Diversification Mechanisms in 
B-lymphocytes 

Ever since Frank Macfarlane Burnet proposed the clonal selec- 
tion theory to account for antibody diversification, 1 questions 
have been raised about the molecular mechanism(s) by which 
the vast reservoir of genetically distinct antibody-expressing genes 
is generated as part of the adaptive immune system. B cells that 
express these antigen-specific antibodies are generated by mul- 
tiple steps of selection as they arise in the bone marrow and subse- 
quently mature in peripheral germinal centers. 2 In this review we 
concentrate on the molecular mechanism(s) by which mutations 
are incorporated in the variable (V) genes and constant region 
switch sequences (IgS), leading to B cells that express antigen- 
recognizing and/or effector function capable antibodies. 

Upon exposure to antigen, B cells undergo two types of con- 
trolled DNA mutation. Through somatic hypermutation (SHM) 
B cells incorporate point mutations in their variable gene exons 
to encode antigen-specific antibodies, while class switch recom- 
bination (CSR) rearranges the constant region genes, permitting 
them to express constant region exons downstream of IgM (e.g., 
the IgG series, IgE, IgA). Immunoglobulins are composed of 
two distinct polypeptides, the heavy chain and the light chain. 
The heavy chain is encoded by the immunoglobulin heavy chain 
locus (IgH), the light chain (IgL) by one of two separate loci, 
Igκ and Igλ. IgH undergoes CSR as well as SHM whereas the 
IgL loci only undergo SHM. In this review, we focus on genetic 
alterations in IgH, predominantly because more analysis has been 
performed on these sequences with respect to the mutagenesis 
process involving DNA single-strand and double-strand break 
formation and repair during SHM and CSR, respectively. 2-5 A 
schematic representation of the antibody-expressing IgH locus is 
shown in Figure 1. 

Following antigen exposure in the germinal center, B cells 
proliferate and undergo CSR and SHM. 2 Transcriptional activa- 
tion of various regions of the IgH locus leads to epigenetic and 
structural changes of genes in the coding and non-coding regions 
of the variable and constant regions. Such changes are required 
for the B cell mutator Activation Induced cytidine Deaminase 
(AID) to access these substrate DNA sequences. 6,7 Various lines 
of evidence, previously summarized, 7 have established that AID 
mutates single-stranded (ss) DNA in vitro and potentially in vivo 
as well. These studies have demonstrated that AID deaminates 
deoxycytidine residues (dC) to deoxyuridines (dU) which are 
then either repaired as deoxycytidines, replicated through dur- 
ing DNA replication to introduce a deoxythymidine (dT) in one 
daughter cell, or converted to dA, dG or dT by the coordinated 
actions of the cellular base excision repair and mis-match repair 
machinery. A schematic representation of the mutagenesis of 
an AID target dC residue is shown in Figure 2, where it is also 
explained how neighboring residues of AID-deaminated dCs can 
be subjected to mutagenesis due to the action of the error-prone 
DNA polymerase η (pol η). 8 Other published reviews provide a 
more detailed overview of the mechanisms that govern 
the specifications and behavior of AID-induced mutagenesis 
shown in Figure 2. 9-11 
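
The replication outcome described above (a dU left unrepaired templates a dA, so one daughter duplex carries a dC to dT transition) can be illustrated with a minimal sketch. This is a toy model under stated assumptions: the function names and the example sequence are hypothetical, strands are written in the same left-to-right register rather than as reverse complements, and the base excision and mismatch repair branches are deliberately not modeled.

```python
# Toy model of one fate of an AID lesion: deamination of dC to dU,
# followed by two rounds of copying without repair.
# All names and the example sequence are illustrative assumptions.

# dU pairs like dT, so a polymerase inserts dA opposite it.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, position: int) -> str:
    """Return the strand with the dC at `position` converted to dU."""
    if strand[position] != "C":
        raise ValueError("AID deaminates deoxycytidine only")
    return strand[:position] + "U" + strand[position + 1:]


def copy_strand(template: str) -> str:
    """Return the base-paired partner of `template`, position by position.

    The partner is written in the same left-to-right register as the
    template (it is not reversed), which keeps positions comparable.
    """
    return "".join(COMPLEMENT[base] for base in template)


def daughter_after_replication(strand: str, position: int) -> str:
    """Sequence inherited in place of `strand` when the dU is replicated through."""
    lesion_strand = deaminate(strand, position)
    new_partner = copy_strand(lesion_strand)   # carries dA opposite the dU
    return copy_strand(new_partner)            # dT now sits where dC used to be


if __name__ == "__main__":
    print(daughter_after_replication("TTAGCTA", 4))
```

In this sketch the daughter strand differs from the parental strand only at the deaminated position, which corresponds to the transition mutation pathway; the transversion outcomes require the repair pathways summarized in Figure 2.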

AID will mutagenize DNA only when the DNA is stabilized 
in a single stranded conformation (ssDNA). In this context, 
recent advances in genome-sequencing technologies have permit- 
ted a better understanding of potential AID targets in the IgH 
locus and the remainder of the genome of normal and malignant 
B cells, thereby permitting the investigation of how ssDNA AID 
substrates are generated at such a wide variety of DNA sequences 
that do not have any obvious characteristics identified with 
ssDNA-generating structures. 5,12 It is likely that AID utilizes 
various transcription-associated events and co-factors to identify 
and generate its ssDNA substrates. In mouse models in which 
Figure 1. Class switch recombination at the Immunoglobulin Heavy Chain locus. (A) The configuration of the unrearranged immunoglobulin heavy 
chain locus in immature B cells according to NG_005838.1. Here, VH, DH and JH represent the various unrearranged gene segments that will generate 
the VDJ exon following V(D)J recombination and are followed by the various constant region genes (Cμ-Cα, yellow boxes). Each constant region gene 
is preceded by a switch sequence (Sμ-Sα, black oval); switch sequences are non-coding regions transcribed by their own transcriptional regulatory promoter 
elements. Two important regions that have enhancer functions and influence various recombination events in the IgH locus are shown in a blue 
box and are labeled as Eμ and Eα (also known as 3' regulatory region, 3'RR). (B) Following V(D)J recombination, various B cell signaling pathways induce 
transcription at switch sequence promoter regions. Transcription at the upstream switch sequence Sμ is constitutive whereas transcription at the 
downstream switch sequence (in this case, Sγ1) is induced due to activation of its promoter elements by various signaling pathways. (C) A schematic of 
a simplified IgH locus that is poised to undergo CSR to IgG1 following transcription activation at Sμ and Sγ1. A region of the switch sequences, known 
as the core switch region (G-rich on the non-template strand), is capable of forming stable RNA/DNA hybrids that lead to ssDNA structure "R-loop" 
formation. (D) Transcription at switch sequences induces formation of R-loops which become targets for AID activity. AID converts cytidine residues to 
uracils that are then recognized by the base excision pathway enzyme uracil DNA glycosylase (UNG). (E) UNG activity induces generation of abasic residues 
that are then cleaved by the apurinic endonuclease family of proteins (APE1/2) to generate DNA double strand breaks (DSBs) at both upstream (Sμ) 
and downstream (Sx, in this case, Sγ1) switch sequences. (F) Recognition of these two DSBs by two cellular DNA damage repair pathways known as 
non-homologous end-joining (NHEJ) and alternative end joining (AEJ) leads to joining of the two distant switch sequences that have DSBs, leading to 
the completion of CSR. (G) The final configuration of the antibody heavy chain molecule coding mRNA is shown. 



ssDNA generating R-loop structures are perturbed in the switch 
regions of the IgH locus 13,14 or where recruitment of ssDNA struc- 
ture stabilizing protein (such as Replication protein A (RPA)) is 
affected, 15 CSR is significantly reduced. In this review we explore 
the postulated roles of RNA polymerase II-associated AID cofac- 
tors in the generation of ssDNA substrates for AID. 

DNA Motifs that Influence AID's Activity 

AID deamination hotspot RGYW motif. 

Two non-exclusive signature motifs in the switch sequences of 
the IgH locus have been identified as influencing AID activity. 
Ig switch sequences (collectively referred to as IgS) contain RGYW 
motifs (where R is a purine, G is a guanine, Y is a pyrimidine 
and W is A or T), and have R-loop (a secondary, stable DNA 



structure) forming sequences caused by G-rich motifs on the 
non-template strand. The RGYW motif rich sequences are found 
to be embedded in secondary structure forming G-rich IgS 
region DNA sequences. As an approximate estimate, the mouse 
Sμ region contains 3.5 kb of repetitive sequence and the downstream 
switch regions Sγ1, Sγ3, Sγ2b, Sε and Sα have 6.5 kb, 
2.0 kb, 2.7 kb, 2.0 kb and 4.0 kb of repetitive elements, respectively, 
all rich in the RGYW motif. Other species, from humans 
to amphibians (Xenopus) to birds (chicken) also have repetitive 
sequences in their switch regions. Switch sequences in these species 
are either composed of RGYW motifs (as in Xenopus) 
or contain large R-loop structure generating sequences that are 
embedded and flanked with RGYW-rich motifs (as in human 
and mouse). 16 While switch sequences have high RGYW density, 
oddly, variable region sequences are not particularly enriched for 
Figure 2. Fate of cytidine residues following AID mediated DNA deamination. AID deaminates cytidine residues to uracils that are identified by the 
cellular base excision repair pathway (UNG) or the mismatch repair pathway (MSH2/MSH6) for repair. Neighboring residues (A/T base pairs) may be 
mutated in this process of DNA lesion repair that depends upon the DNA polymerase η. Multiple possibilities exist explaining how a lesion could be 
repaired. Based on the activity of error prone DNA polymerases (DNA pol η), change of the neighboring A/T base pair could be to T/A or G/C or C/G 
base pairs. Once UNG excises the newly formed uracil residues, the resulting abasic sites are substrates of the apurinic endonucleases (APE1/2), and this reaction eventually leads to creation 
of single-strand DNA (ssDNA) nicks. ssDNA nicks on both strands of switch sequences can generate DNA double strand breaks that are intermediates 
during CSR. 



RGYW content compared with the remainder of the genome. 16 
Even so, AID prefers to mutate transcribed DNA substrates rich 
in RGYW motifs in vitro, indicating that this motif is a preferred 
AID target. 7 The reason(s) why this motif attracts and/or 
stimulates AID activity are not understood beyond some correlative 
studies based on AID activity at RGYW motif-dense 
DNA sequences. 14,17-19 Limited knowledge has been obtained 
from cell-free extract experiments that have demonstrated robust 
AID activity on RGYW-rich sequences, having employed naked 
DNA transcribed from a viral T7 promoter or by a bacterial RNA 
polymerase. 7,14,17,20 Other experiments using chimeric AID proteins 
(generated with amino acid substitutions from APOBEC 
active sites) have revealed a role of amino acids neighboring the 
active site of AID (amino acids 115-123) in recognizing RGYW 
motifs. 21-23 In the future, a crystal structure of AID bound to 
RGYW motifs should clarify how resident motifs 
in AID generate RGYW specificity. 
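
The RGYW definition given above (R = A or G, Y = C or T, W = A or T) translates directly into a pattern search, and a motif density of the kind used to compare switch and variable regions can be computed from it. The following is a minimal sketch, not a published pipeline: the function names are hypothetical, the input is assumed to contain only A, C, G and T, and counting both strands (by scanning the reverse complement) is an assumption made here for illustration.

```python
import re

# Lookahead pattern so that overlapping RGYW occurrences are all reported.
# R = A/G, G = G, Y = C/T, W = A/T, as defined in the text.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")

_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def rgyw_positions(seq: str) -> list:
    """Return 0-based start positions of every RGYW motif on the given strand."""
    return [match.start() for match in RGYW.finditer(seq.upper())]


def rgyw_density_per_kb(seq: str) -> float:
    """RGYW motifs per kilobase, counting both strands (an assumption of this sketch)."""
    top = seq.upper()
    total = len(rgyw_positions(top)) + len(rgyw_positions(reverse_complement(top)))
    return 1000.0 * total / len(top)


if __name__ == "__main__":
    example = "AGCTAGCT"  # hypothetical toy sequence, not a real switch region
    print(rgyw_positions(example), rgyw_density_per_kb(example))
```

Applied to annotated switch and variable region sequences, such a count would only reproduce the descriptive comparison made in the text; it says nothing about why AID prefers the motif.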

Single-stranded DNA structures that attract AID. 

DNA secondary structures that are generated co-transcriptionally 
are potential genome-wide substrates for AID, especially if 
they are stabilized due to their sequence inherent properties (e.g., 
G-richness) or due to associated DNA binding proteins. It was 
recently observed that AID-initiated DNA double-strand breaks 
are predominantly located in proximity to transcription start 
sites genome wide. 24-26 Thus, from small transcription bubbles 
that accompany the RNA polymerase II (RNAP II) transcrip- 
tion complex to large ssDNA structures such as R-loops and 
G-quadruplex structures, all DNA secondary structures that are 
generated by transcription-dependent mechanisms are potential 
targets of AID. 13,27 DNA secondary structures can be influenced 
by various factors such as ion concentrations in the cell, binding 



of proteins that stabilize DNA secondary structures, purine 
and pyrimidine distribution that permits formation of R-loops 
or the four stranded DNA structure called "G-quadruplex" or 
an i-motif. 28 G-quadruplex structures have various functions 
including protection of chromosome ends and regulation of gene 
expression through their influence on transcription initiation. 
The AID target c-Myc locus forms such structures to recruit 
transcriptional activator and non-duplex DNA binding factors 
NM23-H2 and hnRNP-K, which are known to stimulate c-Myc 
transcription. 29,30 

Unlike G-quadruplexes, R-loops are well known secondary 
DNA structures that can be targeted by AID. R-loops can be of 
various types. Transcriptionally active RNAP II complexes can 
generate R-loops of approximately 8-10 base pairs within the 
transcription bubble. On their own, these small R-loops may not 
induce robust somatic hypermutation. However, a low frequency 
of AID-induced mutagenesis could possibly occur at these tran- 
scription complex coupled DNA bubbles. It is also possible that 
negative DNA supercoils generated preceding the transcribing 
RNAP II are a source of ssDNA structures that are converted 
to targets of AID. 19,31 Negatively supercoiled DNA bubbles may 
utilize additional co-factors at canonical AID target sequences 
to stabilize ssDNA structures and stimulate robust AID activ- 
ity. Deviations in RNA processing, RNA splicing, replication 
and/or RNAP II transcription pre-termination pathways can 
also induce spreading of the transcription bubble R-loop into 
a larger ssDNA structure which can then act as a better AID 
substrate. As a model example in a heterologous system, it has 
been demonstrated that depletion of the THO complex, a co- 
transcriptional RNA processing pathway component, increases 
the levels of AID-mediated mutations on the single stranded 
non-template DNA strand at R-loops in S. cerevisiae. 32,33 These 
mutations are generated due to the slow kinetics of formation of the 
ribonucleoprotein complex (RNP) that associates with the transcription 
complex-coupled nascent transcript, which is then able 
to hybridize with the template strand of the transcribed DNA to 
facilitate stable R-loop formation on the DNA. In B cells, similar 
R-loops are generated in immunoglobulin switch sequences, 
although here the transcription associated small transcriptional 
bubble is converted to large R-loop structures by the inherent 
nature of the DNA sequence that contains large stretches of 
G-rich sequences. Some of these AID target switch sequences, 
IgS, now have been extensively characterized via in vitro and in 
vivo studies and shown to form R-loops due to the G-richness 
of the sequence. 13,27,34 Specifically, the largest switch sequence, 
IgSμ, has been experimentally demonstrated to generate long 
stretches of ssDNA R-loop structures using sodium bisulphite 
crosslinking/DNA sequencing based assays; furthermore, these 
regions are direct targets of AID and found to be recombined 
at high frequency with downstream switch sequences during 
CSR. 27,34 In support of these observations, genetic inversion of 
the IgSγ1 sequence leads to loss of co-transcriptional R-loop formation 
and subsequent defects in CSR efficiency in mouse B cells. 13 
Taken together, these observations provide compelling evidence 
that ssDNA R-loop structures are robust AID target sequences 
in B cells. Given that DNA/RNA hybrids present in R-loops may 
also alter the landscape with respect to histone position and sta- 
tus and also control transcription rates, 35 it is likely that R-loops 
play additional roles beyond generating ssDNA to support AID 
activity in the Ig locus. In the next sections, we will discuss the 
many means by which the transcriptional machinery promotes 
formation of DNA secondary structures and regulates AID activ- 
ity in B cells. 
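
Because the text attributes R-loop formation at switch sequences to long G-rich stretches on the non-template strand, a simple sliding-window G content profile gives a rough picture of where such stretches lie. The sketch below is only a crude proxy under stated assumptions: the window size, step and threshold are arbitrary illustrative values, the function names are hypothetical, and G content alone is not a validated predictor of R-loop formation.

```python
def g_fraction_windows(non_template: str, window: int = 20, step: int = 10) -> list:
    """Return (start, G fraction) for each window along the non-template strand."""
    seq = non_template.upper()
    return [
        (start, seq[start:start + window].count("G") / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


def g_rich_windows(non_template: str, window: int = 20, step: int = 10,
                   threshold: float = 0.5) -> list:
    """Return start positions of windows whose G fraction meets the threshold.

    The default threshold of 0.5 is an illustrative assumption, not a value
    taken from the studies discussed in the text.
    """
    return [
        start
        for start, fraction in g_fraction_windows(non_template, window, step)
        if fraction >= threshold
    ]


if __name__ == "__main__":
    toy = "G" * 20 + "AT" * 10  # hypothetical sequence: G-rich half, then AT repeats
    print(g_fraction_windows(toy))
    print(g_rich_windows(toy))
```

Scanning the template strand instead would amount to measuring C content on the non-template strand, so strand orientation matters, as the inversion experiment cited above illustrates.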

AID Regulation Due to Association with the RNA 
Polymerase II Complex 

Before the advent of genome sequencing technology, it was 
assumed that AID only mutates the variable region genes and 
switch sequences of the immunoglobulin loci. This assump- 
tion existed due to the argument that AID-generated mutations 
are deleterious to the genomic integrity of cells and thus there 
should be a factor that only promotes AID mutations in the Ig 
locus. However, recent studies utilizing genome sequencing tech- 
nologies have generated a detailed map of AID target sequences 
genome wide and these studies have also correlated AID associa- 
tion with levels of mutagenesis at transcribed genes in the IgH 
locus as well as at other genes. 25,36 Chromatin immunoprecipi- 
tation studies of AID and components of transcribing RNAP 
II in B cells indicate that RNAP II, in addition to generating 
secondary DNA structures for AID, may directly contribute 
toward AID recruitment. Many protein factors and AID modi- 
fication events have been postulated to generate secondary DNA 
structures and/or target AID to regions of the B cell genome. A 
summary of these AID regulatory events is schematically repre- 
sented in Figures 3A and B and described in Figure 3C. Detailed 
descriptions of direct AID regulatory events have already been 



extensively discussed previously. 37,38 In the following sections we 
discuss AID's interaction with various states of RNAP II with 
which AID may bind and how these AID/RNAP II complexes 
determine somatic mutagenesis in the B cell genome. 

Promoter proximal stalling of RNA polymerase II. 

Two related methods have demonstrated that AID can bind 
to various transcribed genes genome wide and induce mutagen- 
esis in some of these sequences. Chromatin immunoprecipita- 
tion (ChIP) of AID from CSR stimulated B cells followed by 
high throughput sequencing of AID-associated DNA fragments 
(ChIP-seq) demonstrated that AID can bind to various regions 
of the B cell genome. Moreover, AID-bound sequences have 
high occupancy by RNA polymerase II; some of this RNAP II 
is in the elongation phase, with the remainder in the paused or 
stalled conformation. Consistent with these observations, it has 
been reported that RNAP II is enriched at various regions of IgS 
sequences 39 and these IgS sequences are enriched with AID and 
the RNAP II "stalling" factor Spt5. 36,40 For a deeper interpretation 
of RNAP II occupancy results at AID target sequences one 
requires an understanding of the transcription machinery and 
the various mechanisms that regulate the transcription complex. 

The RNAP II transcription complex initiates transcription 
from transcription start sites and undergoes "promoter escape" 
mostly at +1 nucleotide downstream of the transcription start site 
(TSS). Promoter escape is dependent upon the RNAP II asso- 
ciated factor TFIIH that rides the RNAP II complex into the 
elongation phase, and on the phosphorylation of the C-terminal 
tail of RNAP II at residue serine-5, S5. Following promoter 
escape, RNAP II encounters a second rate-limiting step before 
entering transcription elongation; this step is known as "RNAP 
II stalling" and is more accurately termed "promoter-proximal 
transcription stalling (PPTS)." Promoter-proximal transcription 
stalling is known to "poise" the transcription complex in such a 
way that following release of the stalled state the transcription 
complex can rapidly enter the elongation phase. 41 The physiological 
signals that induce release of the paused state of the transcription 
complex include stress, environment-induced signaling 
cascades, and development. At a molecular level, the poised 
transcription complex contains additional cofactors, namely the 
NELF complex and the transcription stalling factor DSIF (com- 
posed of Spt4 and Spt5). Following phosphorylation of NELF, 
DSIF and the C-terminal tail of RNAP II at serine residue S2 
by positive transcription elongation factor P-TEFb and release of 
phosphorylated NELF, the transcription complex enters the elongation 
phase of transcription. 41-43 

Multiple fates of the RNA pol II and the associated transcript 
are possible when it is in the stalled state. Most often the RNA 
pol II readjusts itself to keep the RNA transcript aligned with the 
transcribing RNAP II via a process known as RNAP II "back- 
tracking." RNAP II undergoing backtracking requires activity of 
transcript cleavage factor TFIIS to induce internal cleavage of 
the RNA by the polymerase active site, create a new 3'-end that 
is properly aligned, and continue transcription. RNAP II stall- 
ing may provide regulatory advantages during rapid gene expres- 
sion. One evolving hypothesis is that paused RNAP II has been 
optimized by evolution to promote rapid activation of RNAP II 
Figure 3. Subcellular control of AID activity by various post-transcriptional regulatory pathways. AID mRNA is recognized by two 
miRNAs (miR-155, miR-181) that bind to its 3'UTR to regulate AID mRNA translational efficiency and prevent AID hyperactivity caused by its overexpression. 
55-58 One proposed mode of cytoplasmic AID protein regulation is the chaperone complex HSP90 59 and EF1α, 60 controlling AID's ability to translocate 
into the nucleus of B cells. AID uses its nuclear localization signal (NLS) to translocate into the nucleus where its steady-state nuclear protein levels 
are further controlled by another chaperone, REG-γ. 61 AID translocated in the nucleus can have multiple fates that include its ubiquitination 62,63 and its 
phosphorylation at various serine, threonine, or tyrosine residues (see (B) for details of known phosphorylation sites 64-68). Phosphorylated AID forms 
a complex with its cofactors 14-3-3 69 and RPA 17,70 and PTBP2 71 and binds to the stalled RNA polymerase II complex (marked by RNAP II stalling marker 
Spt5) 40 at AID target sequences, where it interacts with the DNA/RNA hybrid and the 3'-5' RNA exonuclease, RNA exosome. 48 It is postulated that the 
macromolecular complex RNA exosome provides AID the ability to deaminate both strands of its DNA substrate by processing the RNA present in the 
RNA/DNA hybrid associated with the transcription complex. The DNA DSBs in the immunoglobulin switch regions are intermediates that ultimately 
are utilized by the cellular DNA DSB response factors to complete CSR. (B) Schematic representation of AID phosphorylation sites along with AID's 
cytidine deaminase domain, nuclear localization signal (NLS), APOBEC-like region and nuclear export signal (NES) motif. (C) A detailed chart of various 
regulatory elements of AID that are known to directly control its activity. The protein factors and AID modifications are schematized in (A). The mRNA 
stability of AID is regulated by various miRNAs, as indicated in (C). 55-58 



at genes that are developmentally regulated and require instanta- 
neous "off-on" switches. The mechanisms behind RNAP II stall- 
state release are still being unraveled but chromatin modifications 
and chromosomal positioning are factors that are reported to 
control these fates. 44-46 Secondary DNA structures like R-loops, 
G-quartets or repetitive DNA sequences may also cause RNA pol 
II stalling. However, secondary DNA structure-induced RNAP 
II stalling, as would be expected in IgS sequences, may not occur 
at promoter proximal regions of the transcribed DNA element but 
rather at sequences that are substantially downstream of the tran- 
scription start site. Thus, these stalled transcription complexes 
may attract the RNA pol II pre-termination complex that facili- 
tates premature transcription termination. It can be postulated 
that pre-termination stalled RNAP II recruits AID at regions that 
are distal from the promoter of the target sequences and initiates 
mutagenesis. In the next section, we discuss the RNAP II pre-termination 
complex and how it provides AID with its necessary 
co-factor RNA exosome to stimulate DNA deamination. 

RNA polymerase II premature termination. 

Transcription complexes "stall" at regions proximal to the pro- 
moter or during the elongation phase but resolve the paused phase 
to continue transcribing (Fig. 4A); however, at times due to various 
factors including environmental stress, DNA sequence context, or 
other reasons a stalled RNAP II does not continue transcribing 
further and undergoes premature transcription termination (Fig. 
4B). The factors that determine RNAP II elongation competence 
and prevent premature termination following RNAP II stalling are 
not completely understood. However, efficiency of RNAP II 
backtracking resolution or its ability to undergo "RNAP II bubble 
expansion" are some mechanisms that may allow it to bypass ter- 
mination and continue on the path of transcription elongation. 
The mechanisms of RNAP II termination at protein coding genes, 
at non-coding genes, and on sequences undergoing premature termination 
are very different. In these three scenarios the mechanism 
of 3' end processing of the nascent transcript determines 
whether transcription termination will occur. 

In contrast to conventional termination, premature transcrip- 
tion termination requires a separate set of proteins that promote 
the removal of the nascent transcript from the template DNA 
associated RNAP II complex. In yeast, the TRAMP (Trf4/5, 
Air1/2, and Mtr4) complex utilizes its co-factor Nrd1 complex 
(Nrd1-Nab3-Sen1) to bind RNAP II-CTD and recruit the 3'-5' 
RNA exonuclease known as the RNA exosome complex that has 



the ability to degrade the prematurely terminated nascent RNA. 
RNA exosome is a part of the RNA surveillance machinery of 
eukaryotic cells. The function of the TRAMP complex is to add 
short polyadenylation signals to RNA exosome substrate nascent 
RNA and/or recognize secondary RNA structures that facilitate 
RNA exosome recruitment. Recent developments, as discussed 
below, postulate that the pre-termination state of RNAP II may 
be responsible for recruiting AID co-factors and facilitating 
DNA deamination activity in B cells. These possibilities are sche- 
matized in Figure 4. For more details of the various mechanisms 
that control transcription termination, please see the reviews by 
Manley et al. 43,47 

RNA polymerase II associated RNA degradation. 

In a recent study it was reported that the RNA exosome com- 
plex binds with AID in B cells. Moreover, RNA exosome recruit- 
ment to the IgH locus depends upon the activity of AID, since AID 
deficient B cells demonstrate reduced RNA exosome recruitment 
to transcribed switch sequences. 48 Analysis of B cell lines that 
can inducibly undergo CSR demonstrates a significant decrease 
in CSR levels following stable small hairpin RNA (shRNA) 
induced knockdown of RNA exosome subunits. In in vitro 
assays, purified mammalian RNA exosome complex or recombi- 
nant RNA exosome complex is able to stimulate AID mediated 
DNA deamination activity on the template and the non-template 
strands of transcribed DNA substrates, thus providing evidence 
that the RNA exosome complex is a functional co-factor of AID 
activity. 48 In light of these observations, it can be postulated that 
AID utilizes the RNA exosome associated RNAP II complex to 
induce genome mutations. Given that AID-generated mutations 
can occur at varying distances from the transcription start site, it 
is likely that the composition of the RNAP II complex is differ- 
ent when AID is mutating its target sequence within the first 100 
base pairs (stalled due to promoter proximal "pausing/stalling") 
vs. kilobases downstream of the transcription start site (stalled 
following pre-termination pausing events). A simplified model 
would predict that in IgH switch sequences, where AID mutates 
its targets at regions kilobases downstream of the transcription 
start sites, transcriptional paused complexes are not promoter 
proximal and are potentially induced by transcription pre-ter- 
mination caused by the presence of secondary DNA structures 
(R-loops), repetitive sequences (RGYW-motifs in the Xenopus 
switch sequence) and/or the presence of chromatin bound protein 
factors (Fig. 4B). Indeed, AID deamination motif-rich sequences 
Figure 4. AID in the transcription complex. AID, RNA exosome and Spt5 can associate with transcribing RNA polymerase II in two distinct transcription 
complexes. (A) Following transcription initiation, RNA polymerase II at many transcribed genes enters a phase called "transcription stalling." The association 
of RNAP II co-factors NELF, Spt5 and TFIIH leads to the resolution of RNAP II stalling and promotes RNAP II to transition into its transcription 
elongation phase. AID, by virtue of its association with Spt5, accesses this complex and its associated DNA. Moreover, RNA exosome is recruited 
to this complex and can associate with AID and Spt5-associated RNAP II to promote DNA deamination during somatic hypermutation in the first 
100-200 base pairs downstream of the transcription start site. However, during SHM as well as in CSR, it is possible that the RNAP II is able to overcome 
the promoter proximal stalling and enter the elongation phase. Here, if it encounters dU residues (incorporated by preceding RNAP II-associated AID 
complexes), it may stall and recruit RNA exosome and catalyze robust cytidine deamination at DNA regions kilobases downstream from the transcription 
start sites. (B) If the transcription complex does continue into the elongation phase, it may generate stable DNA secondary structures like R-loops 
depending upon the physical properties of the transcribed DNA. For example, transcription of switch sequences is proposed to generate stable large 
R-loop structures due to the presence of G-richness on the non-template strand. If R-loops or other transcription complex impedance factors are recruited 
(or preexist), elongating RNAP II molecules that are loaded on the DNA secondary structure containing templates may undergo a second "stalling" 
event that is analogous to RNAP II pre-termination. In this pre-termination complex RNA exosome actively degrades the nascent transcript to prevent 
continuity of abortive transcription and aberrant DNA/RNA hybrids that can initiate genomic instability. This RNA exosome-associated RNAP II pre-termination 
complex can potentially associate with AID and provide another hub for AID and its associated co-factors to catalyze DNA deamination. This 
scenario parallels the transcription stalling associated AID DNA deamination activity observed during class switch recombination. 



are embedded in R-loop structures that pause RNAP II. 35 Thus 
an RNAP II complex located in switch regions may have two distinct 
functions. First, it transcribes switch sequences to generate 
DNA secondary structures such as R-loops. Second, once 
an R-loop is stabilized, subsequent RNAP II complexes tend to 
stall and recruit associated factors such as RNA exosome 
(for details of pre-termination complex composition, 
please see 43,47). This stalled complex may now recruit AID and its 
co-factors to facilitate DNA deamination-induced double-strand 
breaks. In Figure 3, we have schematically represented the various 
cofactors of AID that ultimately support its binding in the 
RNAP II complex, but a complete understanding of how all these 
factors collectively control AID activity is still emerging. 38,49 

Mutations in the variable region sequences show a different 
distribution: they are either promoter proximal or distributed 
between 100 base pairs and two kilobases downstream of the IgV 
transcription start site. How does such a distribution occur, given 
that IgV sequences do not have any particular sequence motif, 
similar to R-loops in heavy chain switch sequences, that can 
induce RNA polymerase II stalling? It is possible that promoter 
proximal stalling is the mechanism that leads to accumulation 
of RNAP II at variable region sequences in the first 50 to 200 



base pairs downstream of the transcription start site. However, 
the mutations that occur in IgV sequences or at other genomic 
sequences far beyond promoter proximal distances require a 
more satisfying explanation for AID targeting. Multiple possi- 
bilities exist including AID binding and traveling with the stalled 
RNAP II to downstream DNA sequences (where the complex 
may encounter pre-termination events that recruit RNA exosome 
and promote DNA deamination) or the induction of low levels 
of DNA deamination events, induced by mechanisms such as 
DNA supercoiling, that lead to incorporation of deoxyuracils in 
the DNA in a distributive fashion downstream of the promoter 
proximal stalling sites. It can be speculated that the presence of 
these AID-generated deoxyuracils in the DNA template strand 
may produce stalling of the subsequent transcription complex 
as it attempts to utilize these non-canonical dU bases as a tran- 
scription template at regions that are not promoter proximal AID 
targets (Fig. 4A). The stalled RNAP II may now recruit RNA 
exosome and, if associated with AID, catalyze further mutagen- 
esis of the V genes. Thus, RNAP II stalling and AID mediated 
DNA deamination may stimulate one another, possibly forming 
a regulatory feedback loop to promote robust mutagenesis at Ig 
loci in B cells. 
Concluding Remarks 

AID mutations genome-wide are a source of genomic instability 
and chromosomal translocations. Various studies that addressed 
single gene translocations or more recent ones that have evalu- 
ated chromosomal translocations genome-wide have consistently 
established the role of AID in causing genomic instability. 24-26,50-54 
All these studies clearly establish a relationship between AID, 
transcription, RNA pol II regulation proximal to or distal from 
transcription start sites, and genome maintenance. Future studies 
should now focus on how environmental factors in the 
germinal center, epigenetic regulation of AID target genes, 
nuclear organization into transcription control regions, and AID 
mutation repair factories prevent the initiation of oncogenesis. 
With evolving cutting-edge technologies that can sequence 



vast genomes at high resolution or microscopically monitor 
nuclear organization in minute detail, following AID generated 
mutations in B cells may provide answers to questions that have 
a direct implication on many poorly understood aspects of mam- 
malian gene expression biology. 